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Abstract 

This  paper  describes  an  efficient  model  to  describe  an  autoregressive  signal  with  slowly- 
varying  amplitude  in  additive  white  Gaussian  noise.  Even  a  simple  low-order  autoregressive 
model  becomes  complicated  by  varying  amplitude  and  additive  white  noise.  However,  by  ap¬ 
proximating  the  signal  amplitude  as  piecewise-constant,  an  efficient  filtering  approach  can  be 
applied  in  order  to  compute  the  maximum  likelihood  estimate  for  the  entire  data  record.  The 
model  is  efficient  both  in  terms  of  havng  a  compact  set  of  parameters  and  in  the  computational 
sense.  Simulation  results  are  provided.  The  algorithm  has  applications  in  signal  modeling  for 
underwater  acoustic  signals,  particularly  active  wideband  signals  such  as  explosive  sources. 


1  Introduction 

In  narrow-band  active  sonar,  the  processing  bandwith  is  too  narrow  to  permit  the  observation  of 
variations  in  the  signal  plus  interference  (SI)  spectrum.  This  is  not  true  in  wide-band  systems 
where  the  power  spectrum  of  interference,  especially  ambient  noise,  may  significantly  differ  from 
the  signal  and  reverberation.  After  the  data  has  been  filtered  to  pre-whiten  the  interference, 
signal  can  be  modeled  as  a  colored  Gaussian  process  with  fixed  spectral  shape,  but  fluctuating 

amplitude,  in  additive  white  Gaussian  noise  (WGN).  Unless  the  signal  and  interference  have  the 
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same  spectral  shape,  the  fluctuating  signal  amplitude  causes  the  SI  spectrum  to  vary.  This  in 
turn  requires  a  high-fidelity  model  to  estimate  the  continually  changing  SI  spectrum,  causing  over¬ 
parameterization.  In  this  paper  we  describe  a  compact  model  that  parameterizes  the  signal  spectum 
as  a  constant  autoregressive  (AR)  process  with  fluctuating  amplitude.  The  signal  AR  parameters, 
white  noise  variance,  and  parameters  of  the  amplitude  envelope  function  constitute  a  complete  set 
of  model  parameters.  We  present  a  very  efficient  means  of  computing  the  maximum  likelihood 
(ML)  estimates  for  these  parameters  based  on  filtering.  We  also  report  on  the  performance  of  the 
models  using  simulated  data  and  show  how  the  model  can  be  used  in  a  class-specific  classifier. 

1.1  Previous  Work 

The  problem  that  occurs  when  white  noise  is  added  to  an  AR  process,  thereby  forming  an  effective 
ARMA  process,  is  well  known  [1]  and  numerous  methods  exist  for  determining  the  underlying 
AR  parameters  [2],  [3].  Previous  work  did  not  consider  varying  AR  process  amplitude.  As  the 
AR  process  varies  in  level,  it  naturally  changes  the  ARMA  parameters  which  further  complicates 
the  problem.  But  at  the  same  time,  it  opens  up  an  opportunity  for  great  simplification  since  the 
underlying  parameters  are  compact,  involving  only  the  AR  parameters,  the  white  noise  variance, 
plus  the  parameters  of  the  changing  AR  process  amplitude. 

1.2  Paper  Summary 

1.  In  section  2,  we  present  the  mathematical  model  for  an  autoregressive  process  modulated  by 
an  envelope  function  in  additive  white  Gaussian  noise. 

2.  In  section  3,  we  talk  about  how  the  model  may  be  used  in  a  classifier. 

3.  In  section  4,  we  talk  in  general  about  how  to  estimate  the  model  parameters. 

4.  In  section  5,  we  present  the  exact  likelihood  function,  which  is  not  practical  to  use  but  serves 
as  a  standard. 
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5.  In  section  6,  we  present  a  simplifying  assumption  that  the  envelope  function  varies  slowly 
so  that  if  the  data  is  segmented,  the  data  in  a  segment  has  a  fixed  spectral  model.  Then, 
we  analyze  in  detail  the  PDF  of  a  segment  of  data.  We  find  the  segment  power  spectrum 
and  autocorrelation  function  (section  6.1),  derive  the  exact  PDF  of  a  segment  (section  6.2), 
develop  an  equivalent  autoregressive  moving  average  (ARMA)  model  for  the  segment  (section 
6.3),  find  the  exact  derivatives  and  Fisher  information  matrix  (FIM)  of  the  PDF  (sections  6.4, 

6.5) .  We  then  present  a  frequency-domain  (FD)  approximation  to  the  segment  PDF  (section 

6.6) ,  obtain  the  derivatives  and  FIM  (sections  6.7,  6.8).  We  use  the  FD  approximation  to 
derive  a  filtering  approach  to  computing  the  segment  PDF  (sections  6.9,  6.10,  6.11),  as  well 
as  the  derivatives  (section  6.12). 

6.  In  section  7,  we  show  how  the  segment  PDF  results  can  be  combined  to  obtain  the  PDF  and 
derivatives  for  the  entire  data  record. 

7.  In  section  8,  we  summarize  the  algorithhm. 

8.  In  section  9,  we  provide  simulation  results. 

2  Problem  Formulation 

In  this  section  we  formulate  mathematically  an  autoregressive  (AR)  process  with  slowly-varying 
power  in  WGN. 

2.1  Data  model 

Consider  an  AR  signal  process  yt,  of  order  P: 

p 

yt  = aiVt-i  +  et,  (1) 

i=l 

where  et  is  a  zero- mean  independent  Gaussian  innovation  process  with  fixed  variance  a‘^.  Let  the 
data  be  modulated  by  a  time- varying  signal  power  function  Xt  and  corrupted  hy  ut,  a  zero- mean 


3 


independent  Gaussian  additive  noise  process  with  variance 

xt  =  {(f))  yt  +  ut,  t  =  1,2...  N,  (2) 

where  cf)  are  the  parameters  of  the  envelope  function  (assumed  to  be  of  dimension  Q).  Note 
that  both  (T^  and  Xt{(f>)  affect  the  power  of  the  process  at  time  t.  In  order  to  prevent  redundant 
parameters,  we  assume  that  (f)  is  normalized  as  follows: 

N 

T,  =  1-  (3) 

t=l 

The  complete  {P  +  Q  +  2)-dimensional  set  of  model  parameters  are 

®  =  km  «2  •  •  •  ap,  0]- 

3  Classifier  Methodology 

Let  X  =  [xi,X2  . .  .xn]'  be  an  observation  vector  of  N  samples  from  xt.  The  goal  of  the  classifier 
is  to  find  the  MAP  classification  hypothesis  for  explaining  x  as  arising  from  one  of  several  class 
hypotheses,  Hi. 

Hmap  =  argmaxp(iLj|x)  =  argmax  p(x|iLj)  p{Hi) 

(Usually,  we  also  assume  that  the  apriori  probablility  of  each  classification  is  equal,  so  that  the 
hypothesis  is  also  a  ML  hypothesis.  Hml  =  argmax  p(x|LIj).) 

3.1  The  Class  Specific  Classifier 

The  class-specihc  classifier  is  fundamentally  a  Bayesian  classifier  that  produces  a  maximum  a  pos¬ 
teriori  classification  hypothesis  given  the  observation  of  the  raw  data. 

Since  x  is  of  very  high  dimension,  a  closed- form  description  of  p(x|Lfj)  is  impractical.  The  clas¬ 
sical  classification  approach  is  to  choose  a  set  of  features  z  =  r(x)  such  that  z  is  an  approximately 
sufficient  statistic  for  distinguishing  between  any  pair  of  classification  hypotheses,  empirically  es¬ 
timate  p{x\Hi)  from  training  data,  and  output  the  MAP  hypothesis  given  z.  As  the  number  of 
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classification  hypotheses,  M,  increases,  the  dimension  of  z  must  also  increase  to  maintain  suffi¬ 
ciency.  But  as  the  dimension  of  z  increases,  the  accuracy  of  the  estimated  p{z\Hi)  decreases.  This 
tradeoff  is  the  curse  of  dimensionality  inherent  to  this  classification  method. 

A  class  specific  classifier  solves  the  problem  of  the  high  dimensional  x  in  a  way  that  avoids  the 
curse  of  dimensionality  w.r.t.  increasing  M  -  by  projecting  the  estimated  p{z\Hi)  into  the  raw  data 
space,  using  a  class-dependent  reference  hypothesis, 


p{-x.\Hi)  =  Ji{-x.)  p{z\Hi), 


(4) 


where 


M^)  = 


p{x\Ho^i) 


(5) 


is  called  the  “J-function” .  A  different  Zj  =  Ti(x.)  may  be  tailored  to  each  classification  hypothesis. 
Hi,  and  Zj  only  need  be  an  approximately  sufficient  statistic  for  distinguishing  between  //q,*  and 
Hi.  Thus,  the  dimension  of  Zj  does  not  depend  on  the  number  of  classification  hypotheses. 

In  this  paper,  we  concentrate  on  a  specific  model  with  a  particular  set  of  features.  The  model 
is  intended  to  model  signals  with  a  very  specific  form.  By  deriving  the  class-specific  J-function,  we 
are  able  to  use  this  model  in  a  classifier  encompasing  other  models  and  features. 


3.2  Approach  to  J-function 


A  number  of  strategies  exist  for  implementation  of  Ji(x)  in  (4).  Either  a  fixed  or  floating  reference 
hypotheses  may  be  used  for  i7o,i  [4].  The  ML  method  is  a  subset  of  the  floating  reference  hypothesis 
method  [4]  and  is  prefereble  whenever  a  model  depends  on  a  compact  set  of  parameters  and  is 
suitable  for  ML  exstimation.  The  maximum  likelihood  method,  which  we  use  here  to  implement 
Jj(x),  is  written 


p(x;0) 


(27r)  2  |I(©)|2 

where  0  is  the  ML  estimate  of  0,  1(0)  is  the  Fisher’s  information  matrix  [1]  for  the  ML  estimator 
of  parameter  0,  and  D  is  the  dimension  of  0.  Notice  that  to  implement  (6),  we  will  need  a 
functional  form  of  the  likelihood  function  as  well  as  the  Fisher’s  information  matrix. 
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4  Parameter  Estimation 


Besides  obtaining  an  efficient  functional  form  of  the  likelihood  function,  we  also  need  to  estimate 
the  values  of  the  parameters.  Our  approach  is  to  obtain  initial  parameter  estimates  for  0,  then 
iterate  to  find  the  ML  estimate  using  an  approximate  Newton-Rap hson  iteration  based  on  the 
Fisher’s  Information  matrix. 


The  Newton-Raphson  iteration  requires  not  only  the  FIM,  but  also  the  first  derivatives  of  the 
log-likelihood  function  with  respect  to  the  parameters.  The  first  derivatives  are  more  important 
from  an  accuracy  point  of  view  since  an  inaccurate  FIM  only  slows  down  the  algorithm,  while  an 
inacurate  derivative  gets  you  the  wrong  result. 

Consider  a  general  log  PDF  that  depends  on  D  parameters:  0  =  [61,62  • .  -  60]' ■  The  Fisher’s 
information  between  any  two  parameters  6i  and  6j  is  dehned  by 


hifii  =  -E 


8“^  logp(x;  6) 
d6id6j 


Collecting  all  these  values  into  the  matrix  1(0),  we  have  the  D  x  D  Fisher’s  information  matrix. 
The  Cramer-Rao  lower  bound  states  that  the  covariance  matrix  C  of  any  joint  unbiased  estimator 
for  the  parameters  9  is  such  that 

x'(C-I-^(0))x>O 

for  all  X  7^  0.  This  effectively  means  that  I“^(0)  is  the  lower  bound  for  the  covariance  of  any 
unbiased  estimator. 


The  inverse  of  the  Fisher’s  information  matrix  is  a  good  estimate  of  the  parameter  estimation 
error  covariance  and  is  useful  for  iterative  optimization.  Given  a  parameter  estimate  6n,  the  new 
estimate  is  obtained  as 

9n+l=9n  +  ^  S,  (8) 


where 

5  =  [d(0i)  d{62) . . .]' 
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is  the  gradient  vector  formed  from  the  first  partial  derivatives 

d{0i)  =  ^logp(x;0) 

5  The  Exact  Likelihood  Function. 

To  implement  the  numerator  of  (6),  we  need  p(x;  ©),  the  formula  for  the  data  PDF  as  a  function 
of  the  parameters,  or  likelihood  function.  Let  be  the  covariance  matrix  of  process  yt-  Thus, 

is  the  covariance  matrix  of  the  AR  process  yt  when  =  1.  The  covariance  of  a  stationary 
AR  process  is  symmetric  and  Toeplitz  and  can  easily  be  derived  from  the  inverse  Fourier  transform 
of  the  theoretical  AR  power  spectrum  [1].  Let  the  matrix  A  be  a  diagonal  matrix  with  elements 
Kt^t  =  At,  1  <  t  <  N .  Let  u  be  a  zero- mean  Gaussian  random  vector  with  variance  In  matrix 
form,  (2)  becomes 

X  =  u, 

and  the  covariance  of  x  is 

C  =  E{xx'}  =  fj^I  +  A^/2  ^1/2^ 

Now  we  have  the  closed  form  for  the  PDF  of  x  parameterized  by  C,  p(x;  C). 

logp(x;  0)  =  -y  log(27r)  -  ^  log  |  det  C|  -  x.  (10) 

Equation  (10)  is  exact,  but  its  usefulness  is  limited  because  the  size  of  C  is  A  x  A.  In  many 
real-world  problems,  A  is  too  large  to  evaluate  (10)  efficiently.  Nevertheless,  (10)  does  serve  an 
important  role  in  evaluating  the  accuracy  of  approximate  methods. 

6  Piecewise  Stationary  Model 

To  arrive  at  an  efficient  implementation  of  the  PDF,  we  assume  that  Xt  varies  slowly  with  time,  and 
so  may  be  approximated  by  a  constant,  Lm{4>)^  over  an  interval  of  M  samples,  where  1  <  M  <<  A. 
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Thus 


At  =  Lm,  1  +  (m  —  1)M  <  t  <  mM,  (11) 

where  m  is  the  segment  number  and  we  have  dropped  the  argument  (<^)  for  notation  simplicity. 
This  assumption  means  there  is  a  fixed  spectral  model  in  effect  in  the  segment.  This  leads  to 
a  simplified  model  for  the  PDF  in  a  segment.  Note  that  we  do  not  assume  the  segments  are 
independent  when  we  compute  the  PDF  of  the  entire  data  record.  We  only  recognize  that  the 
model  is  constant  within  a  segment. 

We  focus  now  on  the  PDF  for  a  single  data  segment  and  will  later  extend  the  result  to  the 
full  data  record  x.  Let 

Xm  =  [x .  .  .  XMm] 

be  the  segment  m  data.  In  this  section,  we  concentrate  now  on  the  PDF  of  the  segment  m  data, 
for  which  the  parameters  are  defined  as 

*^n‘>  ^1’  *  *  ‘  (f^) 

where  we  combine  the  parameters  Lm  and  into  a  single  parameter 

(13) 


6.1  Segment  Power  Spectrum  and  Autocorrelation  Function 


The  power  spectrum  of  the  data  within  a  segment  can  be  written  as 


iV  _  ^  I 

rk,m  ' 


O’:: 


(14) 


where  we  have  used  (2),  (11),  (13),  and  where  is  shorthand  for  which  is  the  Z- 

transform  of  the  AR  polynomial  a  =  [1,  oi, . . .  ap]  evaluated  on  the  unit  circle. 

Since  it  is  zero-mean,  its  covariance  matrix  is  equal  to  its  autocorrelation  matrix.  The  auto¬ 
correlation  function  (ACF)  is  the  inverse  Fourier  transform  of  the  power  spectrum.  Since  it  is  not 
practical  to  use  the  continuous  frequency  values  we  must  find  a  practical  means  of  computing  the 


ACF  using  the  DFT.  Let  rt^m  be  the  ACF  lag  t  in  segment  m.  For  any  stable  stationary  Gaussian 
process,  the  ACF  decays  to  zero.  Assuming  rt^m  dies  to  zero  for  t  <  K  where  K  <  M,  we  can 
calculate  rt^m  from  using  a  length  2M  inverse  DFT.  We  first  take  the  length  2M  DFT  of  a 
(zero  padded  to  length  2M),  compute  using  (14),  then  take  the  inverse  DFT.  The  result  is 
valid  up  to  lag  M.  In  MATLAB, 

A=fft([a(:);  zeros(2*M-P-l, 1)] ) ; 
rho  =  sig2n  +  sig2m. /abs(A) . *2; 
rm  =  real (if ft (rho) ) ; 
rni=rni(l  :M)  ; 


6.2  Segment  PDF 

Modifying  (10)  for  a  single  segment  m, 

logp(x„;  0)  =  -y  log(27r)  -  ^  log  |  det  Cm\  -  x^,  (15) 

where  Cm  is  the  M  x  M  covariance 

I  2  TD  Af 

Since  Cm  is  symmetric  and  Toeplitz,  we  may  use  the  efficient  Levinson  algorithm  to  compute  (15), 
which  provides  the  determinant  of  Cm  as  a  by-product,  however  M  may  still  be  too  large  to  be 
practical. 


6.3  ARMA  model  using  Spectral  Factorization 


Although  (14)  is  a  compact  spectral  model,  it  is  not  in  the  most  useful  form.  If  we  re-write  it 
in  a  rational  form,  we  may  implement  the  PDF  using  linear  filtering.  By  assuming  the  process  is 
quasi-stationary  within  segments  of  length  M,  we  may  represent  the  power  spectrum  in  a  rational 
form  equivalent  to  an  ARMA  process.  We  may  re-write  (14)  as 


N  ^  +  ^  2 

Pk^m  1^4^  1^  ^b,7n 


I  |2 

I 

WF’ 


(16) 
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which  is  the  PSD  of  an  ARMA(P,  P)  process.  By  equating  the  numerators  in  (16)  we  see  that  the 
z-transformed  AR  and  MA  parameters,  A(z)  and  respectively,  are  related  by 

al^  Br^{z)B*^{l/z*)  =al  +  alA{z)A* {1/ z*).  (17) 

To  obtain  the  equivalent  ARMA  parameterization,  we  need  to  find  and  =  [1,  &m,i  •  •  • 
the  filter  coefficients  corresponding  to  Bm{z).  Using  (17),  we  can  solve  easily  for  the  coefficients  of 
the  order  2P  +  1  polynomial  in  z  corresponding  to  Bm{z)B^{l/z*),  which  we  denote  by 
In  MATLAB,  equation  (17)  is  realized  in  the  time  domain  as: 


beta_ni  =  sig2n*conv(a( : ) ,  f  lipud(a( : ) ) )  ; 
beta_ni(P+l)  =  beta_ni(P+l)  +  sig2ni; 


Using  the  technique  of  spectral  factorization  [5],  we  observe  that  (3^^  is  proportional  to  the  convo¬ 
lution  of  bm  and  b))^,  which  is  b^  reversed  in  time  so  it  has  the  combined  roots  of  b^  and  bj)^. 
The  roots  of  b))^  are  the  reciprocal  of  the  roots  of  b^-  Thus,  we  use  the  following  procedure:  find 
the  roots  of  f3^  and  divide  them  into  reciprocal  pairs.  Take  the  root  with  magnitude  less  than  1 
from  each  pair  and  assign  it  to  b^.  Then  form  polynomial  b^  from  the  roots. 

To  find  the  scale  factor  we  may  equate  the  coefficients  of  the  zero-th  power  of  z  for  both 
sides  of  (17),  resulting  in 

p  p 

2  _  2  I  2  2 

^b,m  /  .  ^m,i  ) 

2=0 


or 


^  _ 

^b,m 


crt 


/,2 

Thus,  we  have  an  efficient  method  for  obtaining  the  ARMA  filter  parameters 


—  ^m,li  bm,2  ■  ■  ■  bm,P}  Q'2  •  •  •  0,p\i 


which  are  an  equivalent  set  of  parameters  to  0m,  although  they  are  overparameterized. 
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6.4  Exact  Derivative  Analysis  of  Segment  PDF 


We  now  find  the  exact  derivatives  of  (15)  with  respect  to  a  arbitrary  parameter  Q.  Using  standard 
results  for  matrix  derivatives,  we  have 

_  _itracelC”iD''  1  +  -x'  C’^x 

QQ  ~  ^m)  ^  ^rn^m 

where  is  the  M-hy-M  matrix  of  derivatives  of  the  elements  of  Cm  with  respect  to  6.  Since  Cm 
is  a  symmetric  Toeplitz  matrix  formed  from  the  ACF  sequence, 

Cm  =  Toeplitz  (r^), 

is  a  symmetric  Toeplitz  matrix  formed  from  the  derivatives  of  the  ACF  sequence: 

Dm  =  Toeplitz  (r^). 


where 


To,m) 

Finally,  since  the  ACF  is  the  inverse  FFT  of  the  power  spectrum, 

Tm  =  lFFT{p^^m,Pl,m  •  •  •  PM-l,m), 

we  have 

Ym  ir  r  1  {pQ^fyi-,  P\^rn  *  *  *  PM—l^m)'> 

where  is  the  derivative  of  in  (14)  with  respect  to  scalar  parameter  9.  These  derivatives 
are  given  later  in  equations  (23)  through  (25). 

To  summarize,  the  derivatives  of  (15)  with  respect  to  a  parameter  0  can  be  computed  in  MAT- 
LAB  as 


Cni=toeplitz(rm(l  :M) ) ; 

Cmi=inv(Cni)  ; 

d  =  real (if ft (rho_theta) ) ; 

D=toeplitz(d(l :M) ) ; 

lpxm_theta  =  -.5*trace(  Cmi  *  D  )  +  .5  *  x’ *Cmi*D*Cmi*x; 
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where  rm  is  the  M-by-1  ACF  vector  and  rho_theta  is  2M-by-l  vector  of  derivatives  ■  This 
is  accurate  as  long  as  the  ACF  corresponding  to  the  power  spectrum  dies  to  zero  at  a  lag  less  than 
M. 

This  approach  is  computationally  of  order  where  M  may  be  quite  large.  But,  since  Cm  and 
are  symmetric  and  Toeplitz,  an  order  approach  exists  that  employs  the  Levinson  algorithm. 
This  may  still  be  prohibitive,  so  we  seek  an  order  M  algorithm  based  on  filtering. 

6.5  Segment  PDF  Fisher  Information  Analysis  -  Exact 

The  exact  second  derivatives  of  (15)  may  also  be  obtained  and  the  Fisher’s  information  computed. 
Let  9i,  02  be  two  arbitrary  spectral  parameters.  Then 

"  itrace(C-iD^iC-iD'^2)-x(,C-iD'^iC-iD'^2C-ix„ 

T  ra  ^  logp(x^)  | 

U01,02)  -  j 

=  -itrace(C^iD®iC-iD®2)  + 

=  itrace(C-iD^iC-iD'^2) 

The  terms  and  can  also  be  obtained  efficiently  using  the  Levin¬ 

son  algorithm.  Despite  this,  the  use  of  the  above  equation  is  primarily  for  validation  since  it  is 
computationally  expensive  and  not  useful  for  combining  segments. 

6.6  Frequency  Domain  Segment  PDF 

In  the  frequency  domain,  it  is  easier  to  analyze  how  the  PDF  of  x^  depends  on  its  parameters,  0. 
Let  A:  =  0, 1 ...  M  —  1  be  the  DFT  of  x^, 

M 

=  E  (18) 

t=i 
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The  frequency-domain  (FD)  approximation  to  (15)  is  given  by  the  log-PDF 

1  (  \X^  P  1 

logp(x„;p^^  .  ..pM-i,m)  =  “o  H  \  ■  (19) 

^  k=0  I  ^^^Pk,m) 

It  may  be  verified  that: 

1.  (19)  is  an  approximation  to  (15). 

2.  Although  written  explicitly  in  terms  of  the  DFT  coefficients  it  is  the  PDF  of  a  real 

multivariate  Gaussian  density  on  x^.  That  is,  if  it  is  re-written  in  terms  of  x^  by  substituting 
the  DFT  formula  (18),  it  may  be  put  into  the  same  form  as  (15),  however  the  covariance  matrix 
is  not  Toeplitz. 

3.  It  is  an  exact  PDF,  that  is,  it  integrates  identically  to  1  on  x^. 

4.  The  DFT  bins  are  independent  complex  Gaussian  RVs.  A  data  sample  x^  drawn  from  this 
PDF  has  independent  complex  Gaussian  DFT  bins  whose  expected  magnitude-squared  is 

£{\Xt!j‘'}  =  0<k<M.  (20) 

PDF  (19)  represents  a  spectrally  non-white  Gaussian  process  that  has  independent  DFT  bins.  It 
is  well  known  that  stationary  Gaussian  processes  have  DFT  coefficients  that  are  asymptotically 
independent  as  the  size  of  the  data  record  goes  to  infinity,  but  only  truly  independent  if  spectrally 
white  [6].  However,  because  (19)  is  defined  in  terms  of  the  magnitude-squared  DFT  bins,  it  is  not 
a  stationary  process.  However,  it  is  a  circularly  stationary  process. 

6.7  Derivative  Analysis  of  Segment  PDF  -  Frequency  domain 

Let  the  first  derivatives  of  (19)  be  denoted  by 

d 

d{9)  =  ^logp(x„;0), 

where  6  is  some  arbitrary  parameter  upon  which  p^^  depends.  Although  we  will  not  use  the  first 
derivatives  of  (19)  themselves,  their  forms  will  help  us  find  efficient  means  of  finding  the  derivatives 
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of  (15).  We  have 


&)  =  m  = 


_ 1  '^M—1 

2  2^k=Q 


where 


From  (14),  we  have 


Tmik)  = 


■ik=0 

_  _ 1  1 

~  ~2  ^k=0 


dpi^ 

ae 


dpr,r, 

ae 


Pk,rr, 


[Xf  |2 


(21) 


Tmik), 


1 

rk,m 


M\2 
k  \ 


dp^ 

dal 


Mip^,mr 


=  1 


dp^ 


k,m 


dp] 


da 


d<rl 

M  (  ^2  aM  .j27zki/M\ 

k,m  _  _2J^g  J  ym^fc  ^  f 


leading  to 


M-l 


dmiXm;crl)  =  --  J2 


l<i<P, 


I  yM  |2 
\^k,m\ 


k=0 


Pt^  Mip^, 


M  )2  [  ’ 

m)  ) 


.Mir  . 
dmiXm;(Tm)  =  -T; 


1^- 


M  |2 
k,m  I 


M 


fcO  k«>  “(ft" 


^k.ra' 


Kl- 


M-l 


dmi^m,(li)  —  ^  ) 


1 


I  |2 
I 


lPk,m  M(p£j^J  l^ri^  ^ 

We  may  obtain  some  simplification  and  intuitive  understanding  if  we  employ  the 
ARM  A  parameterization.  If  we  use  (16)  and  define 


Wt!m  = 


yM  A  M 
^k,m^k 


^k,my^b,m 


M 


IF, 

yM  _  k,m 
^k,m  tdM  ’ 
^k,ra 


and 


= 

k,m 


yrM  aM 


(22) 

(23) 

(24) 

(25) 

(26) 

(27) 

(28) 
equivalent 

(29) 

(30) 

(31) 


k^mV  ^b,m 


14 


equations  (26)  through  (28)  may  be  simplified  to 


M-l 


O'n)  ~  r. 


2  ^  I 

k=0  I  Pk,rr 


M-l 


niM  |2- 

M 


dmi^m'jO'rn)  ~  9 


dmi^rm  di)  — 


a, 


2  M-l 


at 


E 


g— j27rifc/M 

?M  12 
I  , 


fc=0 

1 


A'JA«\-‘  Ma 


At,  A 

2 

b,m 


M- 


'b,m  k=0  I  J  fc=0 

These  alternative  forms  will  help  us  in  section  6.12. 


1  f  'V^  p—j27TiklM 

E-r»  I  ^ k.m^ k.m^  I 

— K- — I 


(32) 

(33) 

l<i<P,  (34) 


6.8  Segment  PDF  Fisher  Information  -  Frequency  domain 

Using  (21),  the  Fisher’s  information  between  any  two  spectral  parameters  9i  and  02  equals 


U01,02)  =  -£  I  ^  logp(x„;  0) }  =  I E  J  Tm{k) 


Before  carrying  out  the  derivative  with  respect  to  62,  notice  that  Tmik)  is  zero  in  expected  value. 
Therefore,  the  only  terms  remaining  are  associated  with  the  derivative  of  Tm{k).  Note  that 


k,m' 


dp^ 


k,m 


+  2 


dpi 


802 


dp’^ 


k,m 


Therefore, 


fc=0 


'  dp^ 


k^m 


90^  J  ip¥,J^  \  902 


(35) 


Using  (23)  through  (25),  in  (21)  and  (35),  we  obtain  the  Fisher  information  for  the  parameters 
a‘^,a‘^ ,  ai,  a2  ■  ■  ■  ap  for  the  segment  m.  We  later  combine  them  to  obtain  the  FIM  for  the  entire 


data  record. 
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6.9  Segment  PDF  computation  by  Filtering 

Efficient  evaluation  of  (15)  may  be  accomplished  by  filtering.  We  may  write 

logp{Xm;  0)  =  log(27r)  -  ^  log  I  det  Cm\  - 


where 


and  Hm  is  the  Cholesky  decomposition  of  Cm'- 


Cm  = 


(36) 


(37) 


and  if  Cm  has  a  symmetric  Toeplitz  form  (constant  along  every  diagonal  and  consistent  with  any 
stationary  process),  then  Hm  is  a  whitening  matrix  that  results  in  the  covariance  of  Wm  being  the 
identity  matrix.  Thus,  we  evaluate  p(xm;  0)  by  whitening  Xm,  hnding  the  total  power,  w(„Wm,  and 
compensating  for  the  determinant  of  the  whitening  matrix,  |  det  Hm  |  • 

We  seek  an  efficient  hlter  implementation  that  approximates  Hm-  From  systems  theory,  we 
know  that  Hm  has  a  linear  shift  invariant  hltering  equivalent.  This  hlter  is  based  on  the  power 
spectrum  (16)  which  suggests  an  ARMA  whitening  hlter  with  Z-transform 


Hm{^)  — 


A(z) 


a? 

b,m 


(38) 


If  we  begin  hltering  Xm  with  this  hlter,  there  will  be  a  startup  transient  since  the  proper  initial 
conditions  are  unknown.  Once  the  transient  has  died  out,  samples  at  the  hlter  output  will  be 
uncorrelated  and  therefore  independent.  We  can  calculate  the  initial  samples  of  Wm  exactly,  then 
switch  to  the  hlter  output  after  the  startup  transient. 


6.10  Determining  length  of  startup  transient 

There  are  many  ways  to  measure  the  length  of  the  startup  transient,  which  is  the  impulse  response 
of  whitening  hlter  Hm{z)  =  A{z)/Bm{z).  A  method  we  have  found  efficient  and  useful  is  to  use  the 
FD  method  to  solve  for  the  autocorrelation  function  (ACF)  corresponding  to  the  theoretical  power 
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spectrum  of  the  inverse  process  pk,m  =  \-^k\^ /\Bk,m\^ ■  To  allow  for  an  ACF  up  to  length  M,  we 
zero-pad  the  polynomials  a  and  up  to  length  2M,  then  take  the  magnitude-square  of  the  the 
length-2M  DFT,  followed  by  an  inverse  DFT.  The  result  is  a  length-2M  inverse  ACF  estimate.  In 
MATLAB, 

A=fft([a(:);  zeros (2*M-P-1 , 1)] ) ; 

B=fft([b(:);  zeros (2*M-P-1 , 1)] ) ; 

A2=msq(A) ; 

B2=msq(B) ; 

ri  =  real (if ft (A2 . /B2) ) ; 

Let  T  be  the  length  of  the  ACF  measured  as  the  index  of  the  last  lag  with  amplitude  larger  than 
ri(l)/500: 

K  =  max(f ind(abs (ri (1 :M) )  >  ri(l)/500)); 

K=max(K,P) ; 

Note  that  we  force  K  to  be  at  least  as  large  as  P,  the  filter  order.  An  example  of  determining  filter 
startup  transient  is  shown  in  Figure  1.  The  MATLAB  code  segment  below  was  used  to  produce  it. 

r  =  real (if ft (B2 . /A2) ) ; 

R=toeplitz(r(l :M) ) ; 

Ci=inv(chol(R) ) ’ ; 
wl  =  Ci  *  x(l :M) ; 

%  determine  whitened  samples  by  block  of  M  samples 
w=f ilter(a,b,x) ; 
plot (w-wl) ; 
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Figure  1:  Example  of  filter  startup  transient.  Top  graph:  difference  between  whitened  samples 
computed  in  a  block  with  samples  computed  by  filtering.  Lower  graph:  ACF  of  the  inverse  power 
spectrum.  Notice  that  after  about  10  samples,  there  is  very  close  agreement,  which  agrees  with  the 
lower  plot  showing  the  decay  of  ACF  corresponding  to  the  inverse  spectrum. 
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6.11  Extension  of  filter  beyond  K  samples. 

Let  the  length  of  startup  transient  be  equal  to  K  samples  where  K  <  M.  Let  us  concern  ourselves 
only  with  the  first  n  samples  of  x,  where  K  <  n  <  M.  Modifying  (36), 

n  1  1  ” 

logp(xi,X2...Xn;0)  = --log(27r)  - -log|detC”  I  - wl^,  (39) 

2=1 

where  is  the  n  x  n  initial  sub-block  of  Cm-  Now  we  ask  how  does  this  equation  change  as 
we  add  one  more  sample,  x^+i?  Since  the  filter  startup  transient  has  died  off,  the  values  of  Wi^m 
obtained  from  (37)  will  be  the  same  as  values  of  Wn  obtained  by  filtering.  Therefore, 

logp{xm;0)\m=n+i  =  logp{xi,  X2  -  ■  ■  Xn,  0)  -  ^  log(27r)  -  ^  log  |  det  C”+VdetC”  |, 

however,  the  ratio  detC^+^/ det  C!^  converges  rapidly  to  as  n  >  K.  Thus,  we  have 

logp(xi,  a;2  .  .  .  Xn+i;  0)  =  logp{xi,X2  ...Xn;0)-^  log{27ralm)  -  (40) 

Combining  (40),  (39),  and  extending  out  to  sample  M, 

K  1  ll 

log p{xm-,  0)  =  -  —  log(27r)  -  -  log  I  det  C^|  -  wlm  -  ^  <^  -  log(27ru^^^)  +  -wlm  \  ,  (41) 

where  Wi^m  is  obtained  from  (37)  for  i  <  K  and  from  filtering  for  i  >  K. 

6.12  Derivative  Analysis  of  Segment  PDF  -  filtering  approach 

The  results  of  section  6.4  are  still  a  little  bit  cumbersome  to  implement.  Using  filtering,  we  may 
find  a  much  more  efhcient  approach  to  finding  the  derivatives  of  PDF  (15),  then  extend  it  across 
segment  boundaries. 

The  filtering  approach  to  finding  the  derivatives  is  directly  analogous  to  section  (6.11)  where  the 
segment  likelihood  function  is  formed.  We  begin  with  the  exact  derivatives  for  the  first  K  samples, 
then  add  the  filtering  results  to  obtain  the  derivatives  of  the  entire  segment. 

Results  from  the  FD  analysis  can  be  used  as  a  guide.  The  time-domain  equivalents  of  equations 
(32)  through  (34)  suggest  the  filtering  approach  to  obtain  contributions  of  sampes  iF  -|-  1  to  M.  In 
MATLAB  notation,  let 
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w=f ilter(a,b,x)/sqrt(sig2b) ; 
u=f ilter(a,b,w)/sqrt(sig2b) ; 
v=f ilter (1 ,b,w) ; 
va=filter(l,a,v) ; 

Then  to  compute  the  contributions  of  samples  K  -\-\  to  M,  equation  (32)  becomes 
Dsig2n  =  - . 5*suni(l . /rho) * (M-K) /M  +  .5  *  sum(u(K+l : M)  T2) ; 

Equation  (33)  becomes 

Dsig2ni  =  - .  5*suni(l . /rho . /A2) * (M-K) /M  +  .  5/sig2b*suni(v(K+l  :M)  .  "'2) ; 

Equation  (34)  becomes 

dal  =  real (if ft (1 . /conj (A) . /B2) ) ; 
dal=dal(2:P+l)  *  (M-K) ; 
for  i=l:P, 

Da(i)  =  sig2./sig2b  *  (dal(i)  -  sum(v(K+l :M)  .*  va(K+l-i :M-i) ) ) ; 
end; 

7  Combining  Segments  for  Entire  data  Set 

We  have  extensively  analyzed  the  segment  PDF  p(xm;0m)  given  in  (15).  We  have  the  derivatives 
dmi^m]  &)  for  each  of  the  parameters  in  6m  as  well  as  ED  approximations  to  the  Fisher  information 
matrix  lm{6m)-  We  would  now  like  to  combine  the  results  to  obtain  the  complete  data  PDF  p(x;  0) 
given  in  (10)  as  well  as  the  associated  derivatives  and  FIM. 

7.1  Combining  Segment  Likelihood  Functions 

We  cannot  simply  add  the  segment  PDFs  to  obtain  the  full  data  PDF  because  this  would  imply 
that  the  data  segments  are  independent,  and  they  are  not.  To  obtain  the  full  data  PDF,  we  need 
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to  extend  the  process  of  whitening  which  we  employed  within  a  segment  in  section  6.11.  Note 
that  equation  (41)  assumes  the  data  spectrum  (and  therefore  the  ARMA  whitening  filter)  remains 
constant,  so  it  is  not  valid  for  samples  greater  than  M.  It  is  natural  to  ask  what  happens  as  we 
continue  filtering  when  we  cross  over  the  boundary  to  a  new  segment?  The  only  obstacle  is  the 
fact  that  the  whitening  filter  changes  at  each  segment  boundary.  If  we  have  made  M  small  enough 
that  the  filter  coefficients  change  only  slightly,  we  can  have  approximate  segment-to-segment  filter 
continuity  by  using  the  filter  state  variables  for  segment  m  —  1  as  the  initial  filter  conditions  for 

segment  m,  only  changing  the  filter  coefficients.  We  have,  the  main  result, 

K  M 

logp(x;  G)  =  -f  log(27r)  -  ^  log  |  det  Cf  |  -  wl^  +  ^  log(27rcr^^^)  + 

t=l  t=K+l 

(42) 

N/M 

+  {f 

m=2 

where  Wi^rn  is  obtained  from  (37)  for  i  <  K  and  from  filtering  for  i  >  K.  The  hlter  coefficients  are 
re-calculated  at  each  segment  boundary  and  filter  continuity  is  maintanted  by  retaining  the  filter 
initial  conditions  as  the  boundary  is  crossed. 

7.2  Derivatives  of  the  Full  data  PDF 

One  obstacle  to  overcome  in  extending  the  results  of  the  segment  PDF  to  the  full  PDF  is  the 
different  parameterizations.  In  the  full  PDF,  the  power  of  the  AR  process  is  a  scalar  variance 
multiplied  by  a  time- varying  envelope  function  But,  in  the  segment  PDF  the  signal  power 

equals  =  LmCr‘^  where  Lm  is  the  value  of  the  piecewise  constant  function  in  the  segment. 

We  take  a  two-step  approach.  First  we  expand  the  parameter  set  to  include  the  segment  signal 
powers.  Let  the  expanded  parameters  be  denoted  by 

=  [<T^,  ai,  02  . . .  ap,  af  . . . 

We  then  calculate  the  derivatives  of  logp(x;0^)  with  respect  to  each  parameter.  Finally,  we 
transform  the  results  to  obtain  the  derivatives  with  respect  to  parameters  0. 
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In  the  first  step,  we  take  a  leap  of  faith  to  arrive  at  an  excellent  and  efficient  approximation 
that  can  be  validated  later  numerically  and  by  comparing  to  the  exact  results.  Essentially,  we  take 
the  analogous  approach  to  calculating  the  full  data  PDF  by  extending  the  filter  approach  in  section 
6.12  within  a  segment  to  multiple  segments.  Since  we  have  derived  a  filtering  implementation  of  the 
derivative  calculation,  we  continue  across  segment  boundaries  in  the  same  fashion  -  changing  the 
hlter  coefficients  to  reflect  the  changed  signal  power  yet  maintaining  filter  continuity  by  retaining 
the  filter  initial  conditions  from  the  previous  segment. 

In  the  second  step,  we  transform  the  results.  Because  Lm  is  a  function  of  0,  we  can  write 

O-m  =  ■  ■  ■  (Pq), 

where  Q  is  the  dimension  of  0.  So,  if  we  write  logp(x;  0)  by  substituting  (T^Lm(0)  for  into 
logp(x;  0'),  the  chain  rule  of  derivatives  gives 

aiogp(x;0)_^  ^  /^aiogp(x;0')^ 

dfj2  2^  ^rn[(p)  I  g  2  I  ’ 

m=l  \  m  ) 

51ogp(x;0)  ^  2  roi;  /^51ogp(x;0O\ 

i  a-i  )■ 

where 

A  d  log  Lm{(t)) 

^  ^4>^  ■ 

The  remaining  parameter,  ai,a2  ■  ■  ■  ap  are  identical  so,  for  example 

d  logp(x;  0)  d  logp(x;  0') 
dai  dai 
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All  of  this  can  be  written  in  the  matrix  form 


91ogp(x;0) 

91ogp(x;0^) 

91ogp(x;0) 

01 

91ogp(x;0^) 

91ogp(x;0) 

9^n/m 

d<j>Q 

=  F 

91ogp(x;0^) 

91ogp(x;0) 

o-n 

o-n 

91ogp(x;0^) 

91ogp(x;0) 

ai 

ai 

91ogp(x;0^) 

Ologp(x;0) 

ap 

dp 

where  F  is  the  (P  +  Q  +  2)  x  (P  +  1  +  N/M)  matrix 


Li 

p2 

•  •  I^N/M 

0 

0 

0 

0 

0 

0 

to 

"  ^N/M 

0 

0 

0 

0 

0 

••  0 

1 

0 

0 

0 

0 

••  0 

0 

I 

0 

0 

0 

••  0 

0 

0 

••  0 

0 

0 

I 

7.3  Fisher  Information  of  the  Full  data  PDF 

As  with  the  derivatives,  we  make  two  steps:  we  obtain  the  FIM  for  the  full  data  PDF  in  the 
segmented  parameterization  0^,  then  convert  the  results.  Since  accuracy  of  the  FIM  is  not  as 
critical  as  the  accuracy  of  the  derivatives,  and  because  of  the  computational  load  of  the  FIM 
calculation,  we  settle  for  the  FD  approximations  in  section  6.8  simply  added  up  over  the  segments. 
The  implicit  assumption  here  is  that  the  segments  are  statistically  independent,  which  they  are  not. 
This  dependence  is  only  an  edge  effect,  however.  Following  the  development  above  for  derivatives, 
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we  write  that 


l(x;0)  =FI(x;0')  F'. 


8  Algorithm  Summary 

The  algorithm  proceeds  as  follows. 

1.  Obtain  initial  parameter  estimates  0.  Compute  Lm,  1  <  m  <  N/M  according  to  the  envelope 
model. 

2.  Determine  the  log-PDF,  derivatives  and  FIM  for  the  entire  data  record  in  terms  of  the  ex¬ 
tended  parameter  set  0h 

(a)  Zero  the  accumulators  for  the  data  log-likelihood  function,  all  derivatives,  and  FIM. 
Segment  number  is  set  to  m  =  1. 

(b)  The  segment  power  spectrum  is  computed  for  segment  m  (section  6.1). 

(c)  Using  the  results  of  section  6.3,  the  ARMA  filter  parameters  for  segment  m  are 
obtained. 

(d)  If  this  is  the  first  segment,  the  length  of  the  startup  transient  (K)  is  determined  (section 
6.10).  The  whitened  data  for  the  first  K  samples  is  determined  (37).  Then  the  exact 
log  PDF  value  of  the  first  K  samples  is  determined  (equation  36).  The  segment  data  is 
filtered  by  the  whitening  filter  (38).  To  form  M  samples  of  whitened  data,  samples  1 
through  K  are  taken  from  (37),  while  samples  K -\-l  through  M  are  taken  from  the  filter 
output.  The  log  PDF  value  for  the  M  samples  of  the  segment  is  calculated  combining 
the  exact  PDF  of  the  first  K  samples  with  the  filter  output  (equation  41).  Add  the 
segment  log  PDF  value  to  the  log-PDF  accumulator.  Save  the  filter  initial  conditions. 

(e)  If  this  is  not  the  first  segment,  the  segment  data  is  filtered  by  the  whitening  filter  (38) 
using  the  stored  initial  conditions  from  the  previous  segment.  This  is  used  to  compute 
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the  segment  log-PDF  (see  the  last  term  in  (42).  Add  the  segment  log  PDF  value  to  the 
log-PDF  accumulator.  Save  the  filter  initial  conditions. 

(f)  If  this  is  the  first  segment,  use  section  (6.4)  to  obtain  the  exact  derivatives  for  a  block 
of  the  first  K  samples.  Then,  use  section  (6.12)  to  obtain  the  contribution  of  samples 
K  +  1  through  M  by  filtering.  Save  all  initial  conditions  for  the  various  filters.  Add 
the  segment  contribution  to  the  derivative  accumulators.  Note  that  the  derivative  with 
respect  to  parameter  cj^  gets  a  contribution  only  for  segment  m. 

(g)  Use  section  (6.8)  to  obtain  the  FD  approximation  to  the  segment  FIM.  Add  to  the 
FIM  accumulator.  Note  that  terms  involving  parameter  get  a  contribution  only  for 
segment  m. 

(h)  Repeat  steps  (a)  through  (h)  until  all  segments  have  been  processed.  At  this  point,  we 
have  the  log-PDF  value  for  the  entire  data  record  as  well  as  derivatives  of  the  log-PDF 
for  each  parameter  in  0^ 

3.  Convert  the  derivatives  and  FIM  that  are  in  terms  of  0^  to  the  parameters  set  0  (section 
7.2). 

4.  Update  the  parameter  ©  estimates  using  (8). 

5.  Repeat  steps  2  through  4  until  0  converges.  Monitor  the  log-likelihood  function,  it  should 
increase  at  each  step  or  remain  the  same. 

9  Simulation 

We  conducted  two  experiments.  In  the  first,  a  simple  smaller  experiment  was  used  as  a  means 
of  comparing  the  exact  PDF  (10)  and  numerically  computed  derivatives  with  the  FD  approach 
(sections  6.6  and  6.7)  and  filtering  approach  (sections  6.11  and  6.12).  We  used  a  Gaussian  shaped 
envelope  function; 

Lm  =  (27rU)-^/2 
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Figure  2:  Comparison  of  log-PDF  values  for  24  trials  using  filtering  approximation.  Left  panel; 
approximate  vs.  exact  log-PDF.  Right  panel;  approximation  error  vs.  exact  log-PDF. 

where  /r,  V  are  the  mean  and  variance  of  the  Gaussian  shape.  The  parameters  we  used  are;  M  =  48 
samples,  N/M  =  16  segments,  =  60,  =  5,  V  =  28.44,  fj,  =  8.5,  and  an  order-2  AR  model 

with 

a= [1.0000  -0.9899  0.4900] 

The  envelope  function  A*  was  a  step-wise  constant  function  equal  to  Lm  in  each  segment.  For 
each  of  24  trials,  we  computed  the  exact  log-PDF  according  (9)  and  (10).  Next,  we  computed  the 
log-PDF  according  to  the  algorithm  described  in  section  8  which  uses  the  filtering  approximation 
(equation  42).  In  figure  2  we  compare  the  exact  PDF  values  with  the  approximation.  There  is  very 
close  agreement  with  log-PDF  error  within  -I-/-0.4.  The  same  experiment  was  conducted  using 
the  FD  approximation  to  the  segment  PDF  (section  6.6),  accumulated  over  the  segments.  The 
results  are  shown  in  figure  3  showing  a  bias  of  -4  and  a  variation  of  -I-/-10.  This  clearly  shows  the 
superiority  of  the  filtering  approach  as  compared  with  the  FD  approach.  A  similar  comparison  can 
be  made  for  the  derivatives  of  the  log-PDF.  In  figures  4  and  5,  we  see  very  close  agreement  with 
the  exact  value  for  the  algorithm  of  section  8  (which  uses  the  filtering  approach  from  section  7.2), 
whereas  the  FD  derivatives  have  significant  errors,  especially  the  AR  parameters.  If  we  look  more 
closely  at  the  derivatives  for  parameter  oi  (figure  6),  we  see  a  dramatic  improvement  when  the 
filtering  approach  is  used. 
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Figure  3:  Comparison  of  log-PDF  values  for  24  trials  using  FD  approximation.  Left  panel:  approx¬ 
imate  vs.  exact  log-PDF.  Right  panel:  approximation  error  vs.  exact  log-PDF. 


Figure  4:  Comparison  of  log-PDF  derivatives  values  for  24  trials  using  filtering  approximation.  In 
each  panel  the  exact  derivative,  determined  numerically  from  equation  (10),  is  plotted  on  the  X 
axis  and  the  approximation  is  plotted  on  the  Y  axis. 
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Figure  5:  Comparison  of  log-PDF  derivatives  values  for  24  trials  using  FD  approximation.  In  each 
panel  the  exact  derivative,  determined  numerically  from  equation  (10),  is  plotted  on  the  X  axis  and 
the  approximation  is  plotted  on  the  Y  axis. 
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Figure  6;  Log-PDF  derivative  errors  for  FD  method  (circles)  and  filtering  method  (asterix). 


28 


total  samples  (N) 


Figure  7:  Comparison  of  execution  times  for  equation  (10)  (triangles)  and  the  procedure  in  section 
8  (circles)  as  a  function  of  N. 

To  compare  execution  times,  we  compare  the  time  to  execute  the  procedure  in  section  8  with 
equation  (10)  as  a  function  of  the  total  number  of  samples  N.  The  results  are  shown  in  figure  7. 

In  the  second  experiment,  we  demonstrated  the  parameter  estimation  accuracy.  The  parameters 
we  used  are:  M  =  128  samples,  N/M  =  60  segments,  =  60,  =  5,  17  =  56.2,  /r  =  30.5,  and  an 

order-4  AR  model  with 

a=  [1.0000  -1.2124  1.2125  -0.8760  0.3540] 

An  example  of  simulated  data  is  shown  in  Figure  8.  To  demonstrate  the  whitening  process,  the 
spectrogram  of  the  concatenated  whitened  samples  wt^i,  1  <  i  <  N/M,  1  <  t  <  M  is  also  shown. 
The  true  values  of  the  parameters  were  not  used  in  the  simulation  except  to  create  the  data. 
Initial  parameter  estimates  were  obtained  by  an  ad-hoc  means,  then  used  as  a  starting  point  in  the 
algorithm.  A  typical  maximum-likelihood  convergence  cycle  is  as  follows: 

lpX(l)=-18773. 774317,  del=9058 . 229974,  step=0 . 500000 
lpX(2)=-18294. 994823,  del=478 . 779494,  step=l . 000000 
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Figure  8:  Example  of  simulated  data.  Top  frame:  theoretical  power  spectrum  as  a  function  of 
segment.  Center  panel:  example  of  spectrogram  of  data.  Bottom  panel:  spectrogram  of  whitened 
data  wt^i  obtained  from  time-varying  ARMA  filter  in  accordance  with  true  parameter  values. 
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lpX(3)=-18290. 429420,  del=4. 565404,  step=l . 000000 
lpX(4)=-18274. 643158,  del=15 . 786262 ,  step=l . 000000 
lpX(5)=-18274. 241121,  del=0 . 402037 ,  step=l . 000000 
lpX(6)=-18274. 239577,  del=0 . 001544,  step=l . 000000 
lpX(7)=-18274. 239531,  del=0 . 000046 ,  step=l . 000000 
lpX(8)=-18274. 239530,  del=0 . 000001 ,  step=l . 000000 
lpX(9)=-18274. 239530,  del=0 . 000000 ,  step=l . 000000 


where  IpX(i)  is  the  log-PDF  value  for  iteration  i,  del  is  the  amount  of  increase  in  IpX,  and  step 
is  the  step  size  a  multiplicative  factor  that  is  normally  1.  Notice  the  rapid  convergence,  decreasing 
del  by  over  an  order  of  magnitude  per  step.  This  is  a  sign  that  the  derivatives  as  well  as  FIM  are 
correct. 

A  better  indication  that  the  algorithm  is  working  can  be  had  by  comparing  simulated  parameter 
covariance  with  the  CR  bound.  We  created  1000  examples  of  the  data  record.  On  each  record,  we 
iterated  to  obtain  the  ML  parameter  estimates.  The  parameters  0  had  dimension  8  and  consisted 
of 

0  =  [fT^,ai,a2,a3,a4,(T^,F,^]. 

The  average  parameter  estimates  of  1000  trials  are 
True  parameter  values 

[5.0000  -1.2124  1.2125  -0.8760  0.3540  60.0000  56.2000  30.5000] 

Average  of  1000  trials: 

[5.0033  -1.2097  1.2081  -0.8734  0.3518  59.7153  56.5399  30.5487] 

The  mean  and  covariance  of  the  estimates  are  shown  below  along  with  the  true  values  and  the 
CR  bound.  Note  that  unlike  the  likelihood  function  and  its  first  derivatives,  the  FIM  (CR  bound) 
does  not  depend  on  the  data  dircetly;  it  depends  only  on  the  parameters.  If  the  true  parameter 
values  are  substituted  in,  the  result  is  the  theoretical  FIM  and  CR  bound,  although  we  used  the 
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FD  approximation.  In  figure  9,  we  compare  the  empirical  parameter  estimation  error  covariance 
with  the  inverse  of  the  theoretical  FIM.  The  match  is  quite  good.  One  half  of  the  log  determinant 
of  the  FIM  is  a  component  of  the  denominator  of  the  J-function.  Note  that  the  two  matrices  above 
have  half-log-determinats  of  -15.55  and  -15.12,  respectively. 

10  Conclusion 

We  have  presented  a  model  for  autoregressive  processes  with  time- varying  amplitude  in  white  Gaus¬ 
sian  noise.  Because  the  exact  theoretical  implemetation  of  the  probability  density  function  (PDF) 
is  cumbersome,  we  have  derived  a  very  accurate  and  efficient  filter-based  implementation  for  use  in 
a  maximum  likelihood  (ML)  framework  or  in  a  class-specific  classiher.  The  key  simplifying  assump¬ 
tion  is  that  the  amplitude  varies  relatively  slowly  so  that  for  a  given  time,  M  samples,  the  process 
can  be  regarded  as  fixed  and  a  linear  shift-invariant  filter  can  whiten  the  data.  The  model  uses 
the  most  compact  parameterization  and  the  likelihood  function  is  computed  very  efficiently  using 
filtering.  At  each  M-sample  segment,  the  whitening  hlter  is  recalculated  and  the  filter  continues. 
The  filtering  approach  is  extended  to  the  calculation  of  the  log-likelihood  derivatives  which  are 
essential  in  order  to  iterate  to  obtain  the  ML  estimates.  We  demonstrated  maximum  likelihood 
estimation  on  simulated  data.  Results  obtained  using  an  efficient  filtering  method  are  compared 
with  the  exact  formulas  and  show  not  only  very  close  agreement  but  orders  of  magnitude  lower 
processing  requirements.  As  a  final  check,  empirical  parameter  estimation  covariance  is  compared 
with  and  agrees  closely  with  the  Cramer-Rao  (CR)  lower  bound. 
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Figure  9;  Comparison  of  empirical  parameter  estimation  error  covariance  matrix  with  the  CR  bound 
(inverse  of  the  average  FIM).  Top  graph:  comparison  of  diagonal  elements.  Solid  line:  empirical 
covariance,  dotted  line:  CR  bound.  Bottom  panels:  CR  bound  and  empirical  covariance  matrices 
normalized  and  displayed  as  an  image.  Both  the  CR  bound  matrix  and  the  empirical  covariance 
were  normalized  by  a  diagonal  similarity  transformation  to  which  resulted  in  a  constant  diagonal 
for  the  CR  bound  matrix. 
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